Deciding between man and zone coverage is one of the most critical strategic choices a defensive coordinator must make before each offensive play in American football. While experienced offensive coordinators and quarterbacks often rely on visual cues to identify these defensive schemes, the increasing availability of player tracking data offers a new avenue to uncover and analyze these tactics. A notable example is Amazon’s NFL Next Gen Stats model, which delivers coverage predictions during live broadcasts (see a snapshot of the 2024 Week 12 matchup between the Pittsburgh Steelers and Cleveland Browns). However, these models appear to be trained on plays without pre-snap motion, or at least on the alignment before any motion occurs (see Amazon), even though motion is a crucial element of modern offensive strategies.
Our project takes this model a step further. While we similarly predict man or zone coverage once the teams are set before the snap, we additionally leverage the information contained in pre-snap player movements. Using a hidden Markov model (HMM), we model defenders’ trajectories based on hidden states, which represent the offensive players they may be guarding. Incorporating summary statistics of the probabilistic HMM results as features into the existing pre-motion model significantly improves both the AUC and the detection accuracy. It further allows us to evaluate the effectiveness of pre-snap motion in uncovering defensive strategies, providing real-time tactical insights for coaches.
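To make the idea of hidden states concrete, the following is a minimal sketch of a forward filter for a single defender, not the model specification used in this project. It assumes that the hidden state is the index of the guarded offensive player, that the defender's \(y\) coordinate is normally distributed around the guarded player's \(y\) coordinate, and that the defender keeps the same assignment between frames with a fixed probability. The function names and the parameters `stay` and `sd` are hypothetical and chosen for illustration only.

```python
import math
from typing import List


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def forward_filter(
    def_y: List[float],
    off_y: List[List[float]],
    stay: float = 0.95,  # assumed probability of keeping the same assignment
    sd: float = 2.0,     # assumed emission standard deviation (yards)
) -> List[List[float]]:
    """Filtered state probabilities per frame.

    def_y[t]    : y coordinate of the defender in frame t
    off_y[t][k] : y coordinate of offensive player k in frame t (at least 2 players)
    Returns out[t][k] = P(defender guards player k | observations up to frame t).
    """
    k = len(off_y[0])
    switch = (1.0 - stay) / (k - 1)
    probs = [1.0 / k] * k  # uniform initial distribution (assumption)
    out = []
    for t, (y, targets) in enumerate(zip(def_y, off_y)):
        if t > 0:
            # Prediction step: propagate through the "sticky" transition matrix.
            probs = [
                sum(probs[i] * (stay if i == j else switch) for i in range(k))
                for j in range(k)
            ]
        # Update step: weight each state by the emission density, then normalize.
        weighted = [p * normal_pdf(y, m, sd) for p, m in zip(probs, targets)]
        norm = sum(weighted)
        probs = [w / norm for w in weighted]
        out.append(probs)
    return out


# Toy example: offensive player 0 goes in motion, player 1 stays put,
# and the defender mirrors player 0 with a one-yard offset.
off_y = [[10.0, 12.0], [15.0, 12.0], [20.0, 12.0], [25.0, 12.0], [30.0, 12.0]]
def_y = [11.0, 16.0, 21.0, 26.0, 31.0]
probs = forward_filter(def_y, off_y)
print(probs[-1])
```

In this toy example the two states are indistinguishable before the motion starts; once the defender follows the motion player, the filtered probability concentrates on that player. Summary statistics of such state probabilities are the kind of quantity that can be passed on as features.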
We analyze tracking data from nine weeks of the NFL 2022 season, provided by the NFL Big Data Bowl 2024. In addition to the tracking data, we use information on plays and players. We further use the corresponding data from PFF, which assigns each play one of several categories representing the different coverage schemes. As one of these categories is not properly described, we omit every play associated with it. Moreover, we omit plays with more than five offensive linemen or with two quarterbacks. Since we are specifically interested in analyzing pre-snap player movements, we also omit plays that do not contain any pre-snap motion. This leaves \(3985\) offensive plays in total, of which the defense played \(2973\) in zone and \(1012\) in man coverage.
To accurately forecast the defensive scheme (man or zone coverage) for every play, we derive various features from the tracking data. We first consider all 11 players on each team and compute features related to the convex hull of their positions. In particular, for defense and offense separately, we compute the area of the convex hull spanned by all players as well as the largest \(y\) distance (i.e. the width of the hull) and the largest \(x\) distance (i.e. the length of the hull). In addition, we select the five most relevant players on each team. For offense, we omit the offensive line and the QB. For defense, we omit nose tackles, defensive tackles and defensive ends, and select the five defenders closest to the five selected offensive players according to a weighted Euclidean distance that puts much more emphasis on the \(y\)-axis. We then use the standardized \(x\) and \(y\) coordinates of these players as features, ordering defensive and offensive players by their \(y\) coordinates. Additionally, for each of the relevant defenders, we compute the distance to the football and the orientation with respect to the quarterback (values taken at the line-set event). Finally, we extract relevant information from the play-by-play data: quarter, down, yards to go, home and away score, and the remaining game time in the current half (in seconds).
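The convex hull features described above can be sketched as follows. This is an illustrative, standard-library implementation (Andrew's monotone chain for the hull and the shoelace formula for the area), not necessarily the routine used in our pipeline; the function names and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position of one player in yards


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_features(points: List[Point]) -> Dict[str, float]:
    """Area, width (largest y distance) and length (largest x distance) of the hull."""
    hull = convex_hull(points)
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1  # shoelace formula
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "area": abs(twice_area) / 2.0,
        "width": max(ys) - min(ys),
        "length": max(xs) - min(xs),
    }


# Toy positions (not real tracking data): four corner players and one interior player.
toy_team = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0), (2.0, 1.0)]
features = hull_features(toy_team)
print(features)
```

Applied once to the offensive and once to the defensive positions at line-set, this yields the three hull features per team.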
Detailed information can be found in the Appendix.
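We train different models to predict whether the defense plays man or zone coverage. Since the aim of the project is to show the effectiveness of pre-snap motion, we follow a three-step strategy: a baseline pre-motion model, a model augmented with naive post-motion features, and a model that additionally incorporates features derived from the HMM.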
In general, we have a limited dataset available (only 3985 plays) and therefore need to control model complexity, i.e. the number of features. The feature engineering described above yields 67 basic features from tracking and play-by-play data, which can be used for all three models. With a larger dataset, using all of these features (and possibly more) would certainly make sense. Given the small dataset, however, we focus on only 32 basic features: 6 convex hull features (3 for offense, 3 for defense), 20 player position features (standardized \(x\) and \(y\) coordinates of 10 players, 5 on offense and 5 on defense), and 6 play-by-play features. In the appendix, we provide more details and analyses using more features. Finally, we need a suitable model class for predicting man vs. zone coverage and choose the following two. On the one hand, we fit a glmnet model, which performs implicit feature selection and is able to handle multicollinearity. On the other hand, we use an xgboost model, which is able to capture nonlinear effects and interactions and in general also handles collinearity, but needs careful hyperparameter tuning and generally performs better on larger datasets. For all models, we use 10-fold cross-validation over a suitable hyperparameter grid.
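For reference, the elastic-net penalized logistic regression fitted by glmnet for a binary outcome solves the problem below. Here \(N\) is the number of plays, \(x_i\) is the feature vector of play \(i\), and \(y_i \in \{0, 1\}\) is the coverage label (which scheme is coded as 1 is an arbitrary choice made here for illustration). The penalty strength \(\lambda \ge 0\) and the mixing parameter \(\alpha \in [0, 1]\) are the hyperparameters tuned by cross-validation; their selected values are not stated in this section.

\[
\min_{\beta_0, \beta} \; -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \big(\beta_0 + x_i^\top \beta\big) - \log\big(1 + e^{\beta_0 + x_i^\top \beta}\big) \Big] + \lambda \Big[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \Big]
\]

For \(\alpha > 0\), the \(\ell_1\) part of the penalty can shrink coefficients exactly to zero, which is the implicit feature selection mentioned above, while the \(\ell_2\) part stabilizes the estimates when features are strongly correlated.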
We first fit a model with the previously described basic features. All of these features are derived at the time of line-set, so this model does not use any motion information. It serves as a baseline against which we can accurately measure the effect of including motion features.
We then augment our basic pre-motion model with naive post-motion features. To keep the complexity manageable, we derive only 6 additional post-motion features: for each team (offense and defense), the maximum \(y\) distance, the maximum \(x\) distance, and the total distance traveled by the team up until the snap.
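A minimal sketch of how such naive post-motion features could be computed for one team is given below. It rests on assumptions that go beyond the description above: the maximum \(x\) and \(y\) distances are interpreted as the largest displacement of any single player relative to their line-set position, and the total distance is the summed path length over all players of the team. The function name, the track format, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One track per player: (x, y) per frame, from line-set (first) to snap (last).
Track = List[Tuple[float, float]]


def motion_features(tracks: Dict[str, Track]) -> Dict[str, float]:
    """Naive post-motion features for one team (offense or defense)."""
    max_dx = 0.0
    max_dy = 0.0
    total = 0.0
    for frames in tracks.values():
        x0, y0 = frames[0]  # position at line-set
        # Assumption: largest displacement relative to the line-set position.
        max_dx = max(max_dx, max(abs(x - x0) for x, _ in frames))
        max_dy = max(max_dy, max(abs(y - y0) for _, y in frames))
        # Path length: sum of frame-to-frame Euclidean distances.
        total += sum(math.dist(a, b) for a, b in zip(frames, frames[1:]))
    return {"max_x_dist": max_dx, "max_y_dist": max_dy, "total_dist": total}


# Toy tracks (not real tracking data): a receiver in motion and a tight end shifting.
offense = {
    "WR1": [(10.0, 20.0), (10.0, 23.0), (10.0, 26.0)],
    "TE1": [(12.0, 30.0), (13.0, 30.0)],
}
offense_motion = motion_features(offense)
print(offense_motion)
```

Calling the function once for the offense and once for the defense gives the 6 additional features per play.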